Measuring contrast processing in the visual system using the steady state visually evoked potential (SSVEP)
Department of Psychology and York Biomedical Research Institute, University of York, UK
Funding statement:
This work was supported by BBSRC grant BB/V007580/1 to DHB and ARW
Abstract:
Contrast is the currency of the early visual system. Measuring the way that the computations underlying contrast processing depend on factors such as spatial and temporal frequency, age, clinical conditions, eccentricity, chromaticity and the presence of other stimuli has been a focus of vision science for over a century. One of the most productive experimental approaches in this field has been the use of the ‘steady-state visually-evoked potential’ (SSVEP): a technique where contrast-modulating inputs are ‘frequency tagged’ (presented at well-defined frequencies and phases) and the electrical signals that they generate in the brain are analyzed in the temporal frequency domain. SSVEPs have several advantages over conventional measures of visually-evoked responses: they have relatively unambiguous output measures, a high signal-to-noise ratio (SNR), and they allow us to analyze interactions between stimulus components using a convenient mathematical framework. Here we describe how SSVEPs have been used to study visual contrast over the past 70 years. Because our thinking about SSVEPs is well-described by simple mathematical models, we embed code that illustrates key steps in the modelling and analysis. This paper can therefore be used both as a review of the use of SSVEP in measuring human contrast processing, and as an interactive learning aid.
Keywords:
- EEG
- VEP
- SSVEP
- Vision
- Contrast
Introduction
Neurons in the visual areas of the brain are primarily responsive to changes in cone photoreceptor activations across time and space. This property, referred to as ‘contrast’, sets the fundamental limits of our visual abilities, which remain steady over a remarkably wide range of environmental light levels. The human response to contrast can be studied using many different techniques. Early work used psychophysical methods to measure contrast sensitivity (Campbell and Green, 1965; Schade, 1956), defined as the inverse of the lowest contrast that can be reliably detected. But neural responses can also be measured more directly using techniques such as functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), and electroencephalography (EEG). Here we will describe how an EEG method known as the steady state visually evoked potential (SSVEP) technique has contributed to our understanding of human contrast processing in health, disease and throughout development.
The SSVEP is a continuous electrical response evoked in the brain by visual stimuli flickering at a constant frequency (D. Regan, 1966a). For contrast-defined stimuli, such as sine-wave gratings, it is strongest at the occipital pole, adjacent to the early visual areas that generate the signal, although careful analysis of individual visually-evoked potentials (VEPs) reveals multiple generators throughout visual cortex (Di Russo et al., 2007, 2005). The flickering stimulus entrains neural population responses at multiples of the stimulus frequency, so continuous EEG data are typically analysed by taking the Fourier transform, and estimating the amplitude at these frequencies.
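The frequency-domain analysis described above can be sketched in a few lines of Python. This is a minimal illustration that assumes NumPy and uses a synthetic signal: the sampling rate, epoch length, response amplitudes and noise level are all hypothetical values chosen for the example, not parameters from any study.

```python
import numpy as np

# All values below are illustrative assumptions, not parameters from any study.
sample_rate_hz = 1000      # assumed EEG sampling rate
n_samples = 2000           # a 2 s epoch: a whole number of 5 Hz cycles
stim_freq_hz = 5.0         # assumed flicker frequency (1F)

t = np.arange(n_samples) / sample_rate_hz
rng = np.random.default_rng(0)

# Toy "EEG": responses at 1F and 2F buried in broadband Gaussian noise.
eeg = (2.0 * np.sin(2 * np.pi * stim_freq_hz * t)
       + 1.0 * np.sin(2 * np.pi * 2 * stim_freq_hz * t + 0.5)
       + rng.normal(0.0, 1.0, n_samples))

# One-sided amplitude spectrum. The factor 2/N converts FFT magnitudes into
# sine-wave amplitudes (valid for every bin except DC and Nyquist).
freqs = np.fft.rfftfreq(n_samples, d=1 / sample_rate_hz)
amplitude = 2 * np.abs(np.fft.rfft(eeg)) / n_samples


def amplitude_at(freq_hz):
    """Return the spectral amplitude at the bin closest to freq_hz."""
    return amplitude[np.argmin(np.abs(freqs - freq_hz))]


print(f"1F: {amplitude_at(5.0):.2f}, 2F: {amplitude_at(10.0):.2f}, "
      f"7 Hz (noise only): {amplitude_at(7.0):.2f}")
```

Choosing an epoch that contains a whole number of stimulus cycles places each harmonic exactly on a Fourier bin, which avoids spectral leakage into neighbouring frequencies.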
Two common stimulus variants involve sinusoidal on-off (or ‘appearance/disappearance’) flicker, where the stimulus alternates between a blank background and the peak contrast, and sinusoidal counterphase flicker, where the stimulus alternates in phase (i.e. the black regions become white and the white regions become black). On-off flicker can excite independent populations of on- and off-cells in the visual system once per cycle. For spatial patterns, the contributions of individual cortical cells to these excitations are thought to sum at the scalp and generate an average of their onset responses on each cycle. For very low spatial frequencies (including zero), responses at the scalp will be dominated by a single on- or off- cell type on each half cycle.
If the amplitudes and waveforms of responses to the stimulus appearance and disappearance are not perfectly balanced, a response at the fundamental flicker frequency, known as 1F, and its integer harmonics can be produced. For stimuli of any spatial frequency (including zero), an imbalance can be caused by differences between the onset- and offset-responses of either on- or off-cells. For example, if only on-sensitive neurons were present (firing to the onset, but not the offset of local increases in luminance) the response to the on/off (or ‘appearance/disappearance’) presentation of a grating of any spatial frequency below the resolution limit would consist entirely of odd harmonics.
By comparison, counterphase flickering patterns generate two essentially identical transients per cycle and therefore do not produce a response at 1F, only even harmonics: 2F, 4F, 6F and so on. Because square-waves are spectrally broad-band, square wave flicker (either on/off or contrast reversing) tends to produce additional spectral harmonics compared to sine-wave flicker.
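The broadband nature of square-wave modulation can be checked directly on the stimulus waveform itself. The sketch below (assuming NumPy, with an illustrative 5 Hz flicker rate) compares the harmonic content of a sine-wave and a square-wave temporal modulation. It describes only the input, not the neural response: the sine wave contains a single component, whereas the square wave contains odd harmonics whose amplitudes fall off as \(4/(\pi k)\).

```python
import numpy as np

sample_rate_hz = 1000   # assumed sampling rate
flicker_hz = 5          # assumed flicker frequency

# One second of data, so FFT bin k corresponds to k Hz. The half-sample
# offset keeps the samples away from the zero crossings of the sine wave.
t = (np.arange(sample_rate_hz) + 0.5) / sample_rate_hz

sine_flicker = np.sin(2 * np.pi * flicker_hz * t)
square_flicker = np.sign(sine_flicker)


def harmonic_amplitudes(signal, n_harmonics=5):
    """Amplitudes at 1F, 2F, ... of a 1 s waveform sampled at sample_rate_hz."""
    spectrum = 2 * np.abs(np.fft.rfft(signal)) / signal.size
    return np.array([spectrum[k * flicker_hz] for k in range(1, n_harmonics + 1)])


sine_harmonics = harmonic_amplitudes(sine_flicker)
square_harmonics = harmonic_amplitudes(square_flicker)

print("sine   1F..5F:", np.round(sine_harmonics, 3))
print("square 1F..5F:", np.round(square_harmonics, 3))
```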
The higher harmonics of the steady-state signal are generally thought to reflect nonlinear processing in the visual system (Regan and Regan, 1988). Because they can arise from different neuronal computations, different populations and even different stages of the visual system, different harmonics can exhibit different input-output functions. For example, 1F and 2F responses can have different thresholds (Bobak et al., 1984) and scalp topographies (Regan, 1973a) while Kaestner et al. (2024) show both different response slopes and thresholds for 1F and 2F components generated by dynamic noise.
SSVEP signals can also be elicited by periodic changes of stimulus properties other than achromatic and chromatic contrast, such as motion, stereo depth, and facial identity or expression (see Norcia et al., 2015, for an overview); however, our focus here is on the contrast response.
Why measure responses to contrast?
Contrast is one of the most fundamental pieces of information that the eye transmits to the brain. It can be defined as the change in cone photoreceptor activity over space (‘spatial contrast’) or time (‘temporal contrast’). Cone photoreceptors - which drive precortical opponent pathways - contribute to both chromatic and achromatic contrast, and although most of the research we describe here focuses on achromatic contrast, SSVEPs have proven to be an excellent measure of early chromatic processing as well (F. Di Russo et al., 2001; McKeefry et al., 1996; Regan, 1975, 1973a, 1966b) (see also Baseler and Sutter, 1997).
Contrast is typically specified as the percentage deviation of a uniform stimulus from the background. So, for example, a disk of 100 units of cone activation (\(I_{\mathrm{stim}}\)) surrounded by a ‘background’ of 50 units of activation (\(I_{\mathrm{background}}\)) has a contrast of \(\frac{I_{\mathrm{stim}} - I_{\mathrm{background}}}{I_{\mathrm{background}}}\) = 100%. Where patterns are more complex (for example, the sine-wave gratings or Gabor patches common in vision science), the Michelson (1927) definition of contrast is specified by the maximum and minimum excursions from the mean:
\[\frac {I_{\mathrm{stimmax} }-I_{\mathrm{stimmin} }}{I_{\mathrm{stimmax} }+I_{\mathrm{stimmin} }}. \tag{1}\]
These contrast definitions are appropriate both to photometric measures of stimulus contrast (for example, luminance; Lennie et al. (1993)) and also to definitions based on cone excitations (Derrington et al., 1984; MacLeod and Boynton, 1979) which are more common in work on chromatic processing.
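Both definitions are straightforward to compute. The short sketch below assumes NumPy; the function names are our own, and the first definition is the one commonly referred to as Weber contrast. It reproduces the disk example from the text and then applies Equation 1 to a synthetic sine-wave grating with a mean of 50 units and a modulation depth of 0.5.

```python
import numpy as np


def weber_contrast(i_stim, i_background):
    """Deviation of a uniform stimulus from its background, as a proportion."""
    return (i_stim - i_background) / i_background


def michelson_contrast(pattern):
    """Equation 1: (max - min) / (max + min) of a luminance or cone-activation profile."""
    pattern = np.asarray(pattern, dtype=float)
    return (pattern.max() - pattern.min()) / (pattern.max() + pattern.min())


# Disk of 100 units on a background of 50 units (the example in the text).
disk_contrast = weber_contrast(100.0, 50.0)

# Illustrative grating: 4 cycles, mean 50 units, modulation depth 0.5.
x = np.arange(1024) / 1024
grating = 50.0 * (1.0 + 0.5 * np.sin(2 * np.pi * 4 * x))
grating_contrast = michelson_contrast(grating)

print(f"disk: {disk_contrast:.0%}, grating: {grating_contrast:.0%}")
```

For a sinusoidal pattern, the Michelson contrast equals the modulation depth around the mean, which is why it is the natural measure for gratings and Gabor patches.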
Although its mathematical definition is straightforward, the computations that underlie contrast processing in the brain have been the subject of intense research for many decades. The neural code for contrast, even in the earliest parts of visual cortex, is not simply a linear transform of the contrast at the retina - instead, contrast signals undergo a cascade of nonlinear processing stages that, broadly, attempt to normalise the output relative to the spatiotemporal environment. This normalization, achieved through a computation called ‘contrast gain control’ (Bobak et al., 1988; Carandini and Heeger, 2011; Foley, 1994; Heeger, 1992; Ohzawa et al., 1985, 1982) maximises the sensitivity of the visual system by making optimal use of neuronal bandwidth. As an example, a grating placed at the centre of a low-contrast background typically appears more intense than the same grating when superimposed on a high contrast background (see Figure 1; note that the code used to produce all figures in this review is available in python and R at: https://github.com/wadelab/contrastReviewPaper2025).
A significant body of research into contrast processing is concerned with how these normalization mechanisms depend on colour (Chen et al., 2000), orientation (Foley, 1994), eye of origin (Baker et al., 2007; Legge, 1979), spatial and temporal frequency (Meese and Baker, 2009), location (Petrov et al., 2005; Polat and Sagi, 1993; Tadin et al., 2003), age (Betts et al., 2005), and the presence of neurological disorders (Porciatti et al., 2000; Tsai et al., 2011). The SSVEP has proven to be invaluable in this research because it provides an objective readout of contrast representation at different stages of the visual system, and allows us to ‘tag’ the probe and background at separate frequencies.
Because it provides a direct read-out of neural population activity, the SSVEP signal can reveal key features of neural signal transduction. For example, by varying the peak stimulus contrast parametrically, a ‘contrast vs response’ function (CRF) can be measured - where the ‘response’ is typically defined as the amplitude of the SSVEP frequency component at the stimulus frequency, or a low multiple thereof. This corresponds closely to similar functions reported by studies measuring single unit activity or local field potentials in the cortex (Morrone et al., 1982; Shapley and Victor, 1980). However the SSVEP has the advantage that it is non-invasive, and so can be measured in awake, behaving human participants.
Although the SSVEP does seem to reflect the activity of relatively well-tuned neuronal populations, changing some aspect of the stimulus may change the nature of those populations. For example, Campbell and Maffei (1970) noted that densely-sampled measurements of contrast responses can reveal the presence of two qualitatively different types of neurons that exhibit different log-linear contrast functions (Nakayama and Mackeben, 1982; Norcia et al., 1989; Souza et al., 2007; see also Tyler and Apkarian, 1985). Modeling such two-limbed contrast response functions using a single sigmoidal function is therefore an approximation.
To understand the utility of the contrast SSVEP, it is helpful to identify the cascade of processing stages in the early visual system that give rise to it. In the following section we illustrate how a typical SSVEP signal measured over early visual cortex might contain information about a large number of early visual computations.
Contrast processing - linear and nonlinear
Neurons have a limited dynamic range, yet they can transmit information about visual stimuli that span many orders of magnitude. In the domain of contrast, to some extent this is accomplished at a population level - individual neurons typically implement a non-linear, sigmoidal CRF transducer (Albrecht and Hamilton, 1982; Tolhurst et al., 1981) and different neurons exhibit peak sensitivity (defined as the maximum slope of the function) at different contrast levels (Busse et al., 2009; Carandini et al., 1998; Carandini and Heeger, 1994). A neuronal population will therefore span a sensitivity range greater than any individual member.
Individual neurons at multiple stages of the visual hierarchy also change their sensitivity depending on the average spatiotemporal contrast energy of their environment. This “normalisation” process is dynamic and nonlinear and is well-modeled by a hyperbolic ratio function in which the response of each neuron is modulated by a local ‘gain pool’ composed of the summed responses of the local neuronal population (Baker and Wade, 2017; Busse et al., 2009; Carandini and Heeger, 2011; Heeger, 1992).
An additional complexity is introduced by the fact that the EEG is an average population response and that the visual system contains many different types of neurons. As a stimulus changes in (for example) contrast, it may selectively activate qualitatively different neuronal populations.
To understand these processes better, we will show how sinusoidal input signals might be processed by the visual system to produce SSVEPs. Figure 2 illustrates how sine waves of different contrasts are processed in a linear system. The first panel shows the input sine wave, which would be used to modulate stimulus amplitude over time. Notice that there are five peaks in the waveform during the one second sample, so the stimulation frequency is 5Hz (F1). The second panel shows the Fourier transform of the waveform, which contains a substantial peak at this frequency. If we change the stimulus contrast (i.e. the amplitude of the waveform), the amplitude of the F1 component increases linearly with contrast (right panel).
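The linear case can be sketched as follows. This is a minimal example assuming NumPy, with illustrative contrast levels, and it is not the code used to generate the figure. A 5 Hz sine wave is scaled by each contrast level and the amplitude at F1 is read from the Fourier spectrum.

```python
import numpy as np

sample_rate_hz = 1000                              # assumed sampling rate
t = np.arange(sample_rate_hz) / sample_rate_hz     # one second of samples
f1_hz = 5                                          # stimulation frequency (F1)
contrasts = np.array([0.1, 0.2, 0.4, 0.8])         # illustrative contrast levels


def amplitude_at(signal, freq_hz):
    """Amplitude of the Fourier component closest to freq_hz."""
    spectrum = 2 * np.abs(np.fft.rfft(signal)) / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate_hz)
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]


# In a linear system the output waveform is simply the scaled input.
f1_amplitudes = np.array(
    [amplitude_at(c * np.sin(2 * np.pi * f1_hz * t), f1_hz) for c in contrasts]
)

for c, a in zip(contrasts, f1_amplitudes):
    print(f"contrast {c:.1f} -> F1 amplitude {a:.3f}")
```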
Next we can consider the impact of two types of nonlinearity in processing on the responses.
First, we consider the fact that neurons in the early visual system code either positive or negative contrast. Each type of neuron increases its firing rate with increasing levels of its preferred contrast polarity, and typically exhibits zero (or baseline) response to the non-preferred polarity. We can model the combined responses of these two cell populations to a time-varying contrast signal by full-wave rectification. The effect of this computation is to double the input frequency in the population measurement and to introduce additional higher (even) harmonics due to the discontinuity at the contrast reversals. The effect of full-wave rectification on the SSVEP signal is shown in Figure 3.
The contrast response function (CRF) is still linear for this model - rectification has doubled the frequency (and introduced higher harmonics) but the response amplitude measured by the second harmonic is still directly proportional to the input amplitude.
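A minimal sketch of the rectification step, assuming NumPy and illustrative contrast levels, is given below. The Fourier series of \(|\sin(\omega t)|\) contains only even harmonics, with amplitude \(4/(3\pi)\) at 2F and \(4/(15\pi)\) at 4F, so after rectification the response at 1F vanishes and the 2F amplitude remains proportional to contrast.

```python
import numpy as np

sample_rate_hz = 1000                              # assumed sampling rate
t = np.arange(sample_rate_hz) / sample_rate_hz     # 1 s, so FFT bin k is k Hz
f1_hz = 5                                          # stimulation frequency
contrasts = np.array([0.1, 0.2, 0.4, 0.8])         # illustrative contrast levels


def amplitude_spectrum(signal):
    """One-sided amplitude spectrum of a real signal."""
    return 2 * np.abs(np.fft.rfft(signal)) / signal.size


amp_1f, amp_2f, amp_4f = [], [], []
for c in contrasts:
    stimulus = c * np.sin(2 * np.pi * f1_hz * t)
    # Full-wave rectification: summed output of on- and off-populations.
    population_response = np.abs(stimulus)
    spectrum = amplitude_spectrum(population_response)
    amp_1f.append(spectrum[f1_hz])
    amp_2f.append(spectrum[2 * f1_hz])
    amp_4f.append(spectrum[4 * f1_hz])

amp_1f, amp_2f, amp_4f = map(np.array, (amp_1f, amp_2f, amp_4f))
print("1F:", np.round(amp_1f, 4))
print("2F / contrast:", np.round(amp_2f / contrasts, 4))
print("4F / contrast:", np.round(amp_4f / contrasts, 4))
```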
Neurons in the visual system typically do not exhibit this perfect response linearity. Instead, neuronal responses are well-modeled by some form of saturating non-linearity: as the contrast increases, the response of the neuron increases but the response increase per unit contrast is reduced at the high end of the contrast range. This saturating nonlinearity is often modeled as a hyperbolic ratio function of the form:
\[ R=R_{\mathrm{max}} \cdot \frac{C_{\mathrm{in}}^n}{C_{50}^n + C_{\mathrm{in}}^n}, \tag{2}\]
where \({R_{\mathrm{max}}}\) describes the maximum response level, \(C_{\mathrm{in}}\) is the input contrast (or the time-varying waveform), \(C_{50}\) is the ‘semi-saturation constant’ (the contrast at which the response reaches half of its maximum) and \(n\) controls the steepness of the curve (with a typical value around \(n=2\)).
This nonlinearity has clear effects on signal transduction. In Figure 4 the full-wave rectified waveform (which combines the responses of both ‘on’ and ‘off’ neurons) passes through the saturating nonlinearity (left panel of Figure 4). The frequency doubling seen in Figure 3 is still present but the contrast response function is now nonlinear (right panel of Figure 4).
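Equation 2 translates directly into code. In the sketch below (assuming NumPy) the function name and the default parameter values, including \(C_{50}=0.3\), are illustrative assumptions rather than fitted values.

```python
import numpy as np


def hyperbolic_ratio(c_in, r_max=1.0, c50=0.3, n=2.0):
    """Equation 2: saturating transducer. Default parameters are illustrative."""
    c_n = np.asarray(c_in, dtype=float) ** n
    return r_max * c_n / (c50 ** n + c_n)


contrasts = np.array([0.0, 0.1, 0.3, 0.6, 1.0])
responses = hyperbolic_ratio(contrasts)

for c, r in zip(contrasts, responses):
    print(f"contrast {c:.1f} -> response {r:.3f}")
```

The response at \(C_{\mathrm{in}} = C_{50}\) is exactly half of \(R_{\mathrm{max}}\), and the function approaches but never reaches \(R_{\mathrm{max}}\) as contrast increases.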
Interestingly, although the hyperbolic ratio function (Equation 2) is monotonic, the CRF resulting from measuring the amplitude of the second harmonic (2F) component contains a slight roll-off at high input contrasts. This results from the distortion of the input sine waves at high contrast due to a combination of the full-wave rectification and saturating non-linearity. Power at other harmonics increases, and the total power generated by the input is monotonic. This roll-off is often seen in experimental data and has been referred to as ‘supersaturation’ (Peirce, 2007; Tyler and Apkarian, 1985).
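The roll-off can be reproduced with a short simulation. The sketch below assumes NumPy and illustrative parameters (\(n=2\), \(C_{50}=0.3\), \(R_{\mathrm{max}}=1\)), which are not necessarily those used for Figure 4. It passes rectified sine waves of increasing contrast through Equation 2 and records both the 2F amplitude and the total power of the output. For \(n=2\) the 2F amplitude has a closed form, \(2(s-1)/(s(s+1))\) with \(s=\sqrt{1+(C/C_{50})^2}\), which peaks at \(s=1+\sqrt{2}\), that is, at a contrast of roughly \(2.2\,C_{50}\).

```python
import numpy as np

sample_rate_hz = 1000                              # assumed sampling rate
t = np.arange(sample_rate_hz) / sample_rate_hz     # 1 s, so FFT bin k is k Hz
f1_hz = 5
r_max, c50, n = 1.0, 0.3, 2.0                      # illustrative parameters
contrasts = np.linspace(0.05, 1.0, 20)


def transduce(waveform):
    """Full-wave rectification followed by the hyperbolic ratio (Equation 2)."""
    rectified_n = np.abs(waveform) ** n
    return r_max * rectified_n / (c50 ** n + rectified_n)


amp_2f = np.empty(contrasts.size)
total_power = np.empty(contrasts.size)
for i, c in enumerate(contrasts):
    response = transduce(c * np.sin(2 * np.pi * f1_hz * t))
    spectrum = 2 * np.abs(np.fft.rfft(response)) / response.size
    amp_2f[i] = spectrum[2 * f1_hz]
    total_power[i] = np.mean(response ** 2)

peak_contrast = contrasts[np.argmax(amp_2f)]
print(f"2F amplitude peaks at contrast {peak_contrast:.2f}")
print(f"2F at peak: {amp_2f.max():.3f}, 2F at full contrast: {amp_2f[-1]:.3f}")
```

With these parameters the 2F amplitude is largest at an intermediate contrast and slightly lower at full contrast, while the total power rises monotonically because the additional energy moves into higher harmonics.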
To illustrate the effect of contrast gain control (Heeger, 1992; Ohzawa et al., 1982), we next include a second component (a ‘mask’) that contributes to the gain pool of the first (the ‘target’). The mask stimulus will suppress the response at the target frequency, reducing its amplitude. The suppression is reciprocal - activity at the mask frequency is also reduced by the presence of the target, in a contrast-dependent manner (Brown et al., 1999; Busse et al., 2009; Candy et al., 2001; Regan and Regan, 1988). At a single pair of (matched) contrast levels, we see a complex pattern of intermodulation terms in the transduced waveforms (left panel of Figure 5) and in the Fourier spectrum (middle panel of Figure 5). The two components interact, generating nonlinear intermodulation terms at sums and differences of the input frequencies (F1, F2). The nature of the interaction (and therefore the pattern of intermodulation terms) is determined by the computations happening as the contrast signal moves from retina to cortex (Regan and Regan, 1988).
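The origin of intermodulation terms can be seen with a deliberately simple toy nonlinearity. The sketch below (assuming NumPy, with illustrative tag frequencies of 5 and 7 Hz) uses squaring rather than a gain control model, purely because the result is analytically exact: \((\sin a + \sin b)^2\) expands to components at \(2a\), \(2b\), \(a+b\) and \(b-a\). A linear sum of the same inputs contains energy only at the two input frequencies.

```python
import numpy as np

sample_rate_hz = 1000                              # assumed sampling rate
t = np.arange(sample_rate_hz) / sample_rate_hz     # 1 s, so FFT bin k is k Hz
f1_hz, f2_hz = 5, 7                                # illustrative tag frequencies

target = np.sin(2 * np.pi * f1_hz * t)
mask = np.sin(2 * np.pi * f2_hz * t)


def amplitude_spectrum(signal):
    """One-sided amplitude spectrum of a real signal."""
    return 2 * np.abs(np.fft.rfft(signal)) / signal.size


linear = amplitude_spectrum(target + mask)
# Toy squaring nonlinearity (an assumption for illustration only).
nonlinear = amplitude_spectrum((target + mask) ** 2)

components = {
    "F2-F1": f2_hz - f1_hz,
    "F1": f1_hz,
    "F2": f2_hz,
    "2F1": 2 * f1_hz,
    "F1+F2": f1_hz + f2_hz,
    "2F2": 2 * f2_hz,
}
for label, freq in components.items():
    print(f"{label:>6} ({freq:2d} Hz): linear {linear[freq]:.2f}, "
          f"squared {nonlinear[freq]:.2f}")
```

More realistic nonlinearities, such as rectification followed by divisive gain control, generate a richer set of terms, and the relative amplitudes of those terms are what constrain the underlying model.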
If the two inputs were simply added together, the representation of the resulting signal in the Fourier domain would be the linear sum of the two independent signals (i.e. peaks at F1 and F2). However, it is important to take physiology into account. Contrast gain control reduces the amplitude of the responses via a suppressive process, which can be incorporated into our transducer function via an additional denominator term:
\[ R=R_{\mathrm{max}} \cdot \frac{C_{\mathrm{in}}^n}{C_{50}^n + C_{\mathrm{in}}^n + C_{\mathrm{mask}}^n}, \tag{3}\]
where \(C_\mathrm{mask}\) reflects the contrast of the mask component at a frequency distinct from that of the target. The effect of this extra term is to reduce the target response (see right panel of Figure 5). For a linear contrast axis, the contrast response function becomes shallower, whereas on a logarithmic contrast axis it maintains its steepness and shifts to the right. Suppressive effects of this kind have been obtained using SSVEP with a variety of different types of mask, including orthogonal overlaid masks, surround masks, and dichoptic masks (Burr and Morrone, 1987; Busse et al., 2009; Candy et al., 2001; Cunningham et al., 2017; Ross and Speed, 1991; Salelkar and Ray, 2020; for a meta-analysis see Baker et al., 2021).
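The rightward shift on a logarithmic axis follows directly from Equation 3: the mask term can be absorbed into an effective semi-saturation constant, \(C_{50}' = (C_{50}^n + C_{\mathrm{mask}}^n)^{1/n}\), so the masked function is the unmasked function displaced by a factor of \(C_{50}'/C_{50}\). The sketch below (assuming NumPy) checks this numerically. It applies Equation 3 to static contrast values rather than to full waveforms, and the parameter values are illustrative.

```python
import numpy as np


def gain_control_response(c_target, c_mask, r_max=1.0, c50=0.3, n=2.0):
    """Equation 3 applied to static contrast values (illustrative parameters)."""
    c_n = np.asarray(c_target, dtype=float) ** n
    return r_max * c_n / (c50 ** n + c_n + c_mask ** n)


c50, n, c_mask = 0.3, 2.0, 0.4
contrasts = np.logspace(-2, 0, 9)                  # 1% to 100% contrast

unmasked = gain_control_response(contrasts, 0.0, c50=c50, n=n)
masked = gain_control_response(contrasts, c_mask, c50=c50, n=n)

# Multiplicative shift along the contrast axis predicted by Equation 3.
shift = ((c50 ** n + c_mask ** n) / c50 ** n) ** (1 / n)
masked_at_shifted = gain_control_response(contrasts * shift, c_mask, c50=c50, n=n)

print(f"predicted rightward shift: x{shift:.3f}")
print("masked CRF at shifted contrasts matches unmasked CRF:",
      np.allclose(masked_at_shifted, unmasked))
```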
Even without considering a spatial component, the early visual system is far more complex than the model here suggests. For example as well as cells that code positive or negative contrast in a more or less continuous manner, the retina also contains ‘transient’ cells that code temporal changes in contrast. These cells (Alpern, 1971; Kuffler et al., 1957), and cells with similar properties in the lateral geniculate nucleus (LGN) (Levitt et al., 2001) and cortex (Hubel and Wiesel, 1959; Movshon, 1975), will introduce second harmonic components even when the stimulus itself is modulated in an on-off fashion (McKeefry et al., 1996). Analogously, in the spatial domain, so-called ‘simple cells’ are sensitive to the polarity of a spatial contrast modulation while ‘complex cells’ respond to the presence of patterned spatial contrast irrespective of its spatial position (Hubel and Wiesel, 1962). Both the amplitude and the phase of the SSVEP response to a single contrast-reversing sine-wave grating therefore contain information about nonlinear computations performed across a range of retinal and cortical cell types.
The complexity of even a simple simulation of the frequency-domain signal arising from non-linear interactions is intriguing. Presumably, the signal measured from early visual cortex is the result of a cascade of nonlinear retinal and cortical operations up to that point. It therefore contains a ‘signature’ or ‘fingerprint’ of the computational nature, order and parameters of those operations - including the shape of the transducer functions and the computations involved in signal combination. In principle, that information could be recovered from the SSVEP signal - a possibility recognised in the early days of the technique (Regan and Regan, 1988). Characterising the complete set of computations along the entire processing pathway is challenging, and a full system identification may require the use of multiple different input frequencies (Boyd et al., 1983). Nevertheless, careful parametric variation of the input stimuli does allow us to fit models of early visual processing (e.g. Zemon and Gordon, 2006) and, by incorporating mathematical models of gain control mechanisms with either instantaneous or temporally-integrated gain control, it is possible to model gain control using SSVEP data. For example, Tsai et al. (2012) demonstrated that a relatively simple gain control model gave a good account of the pattern of intermodulation responses produced by two overlaid patterns flickering at different frequencies. This was achieved by passing full stimulus waveforms through the transducer nonlinearity, and calculating the Fourier spectrum of the model output. More recent work has used this approach to study binocular integration in both normal subjects and amblyopes (Hou et al., 2020; Hou 侯川 et al., 2021). Our own work on signal combination across eyes and space similarly demonstrated close correspondence between the predictions of a computational model and empirical data in humans (Baker and Wade, 2017).
More detailed modelling of intracortical recordings (Groen et al., 2022) has revealed details of the timecourse of gain control effects, specifically that normalization is delayed slightly relative to the initial visual response.
Similar changes to the contrast response function might also be obtained using adaptation paradigms, in which the visual system is exposed to high contrast stimuli for long durations. Psychophysically, adaptation increases detection thresholds, but has little effect on contrast discrimination performance (Ross et al., 1993), much like pattern masks (Foley, 1994). Although SSVEP adaptation effects show strong tuning for orientation (Campbell and Maffei, 1970; Vergeer et al., 2018) and spatial frequency (Mecacci and Spinelli, 1976), there appear to be very few measurements of the full SSVEP contrast response function before and after an extended period of adaptation. One exception is a study by Bach et al. (1988) that examined the effects of adaptation on SSVEP responses, and found a paradoxical increase in the 1F SSVEP amplitude following prolonged exposure to high-contrast stimuli across much of the probe contrast range. They explain this as being due to the presence of a nonlinear transducer function that saturates at high probe contrasts, driving the response to higher harmonic frequencies. Adaptation reduces the apparent contrast of the input, leading to less distortion, a more sinusoidal output and a concentration of the frequency-tagged response at lower harmonics. This explanation is conceptually related to the ‘supersaturation’ effect that we observed in Figure 4.
Measuring the development of contrast processing
An early use of the SSVEP was to provide an objective estimate of spatial contrast sensitivity in infants, without requiring behavioural responses. In well-motivated adults, psychophysical measurements of contrast sensitivity remain the gold standard. However, it is difficult and time consuming to obtain reliable psychophysical data from infants. In these cases, SSVEP measurements represent a fast and efficient method for measuring low-level visual responses (Braddick et al., 1986; Norcia et al., 1990; Tyler et al., 1979) and the high SNR of SSVEP means that infants need only look at the screen for short periods of time.
Because SSVEP responses at detection threshold are very small, estimating a threshold is achieved by measuring the contrast response function at relatively high levels (in the linear part of the log-contrast response function prior to saturation), and extrapolating back along the function (either contrast vs response measured at a constant spatial frequency or spatial frequency vs response at a constant contrast level) to estimate its intercept with the x-axis (Campbell and Maffei, 1970) (see Figure 6). This contrast level has been shown to correspond approximately with psychophysically measured detection thresholds (Campbell and Kulikowski, 1972; Norcia et al., 1986). One factor affecting the accuracy of this technique is the presence of noise in the EEG signal which will increase the apparent amplitude of the SSVEP responses across the measurement range. The effect of this noise is particularly apparent for recordings with low SNR or where the slope of the CRF is shallow (and therefore where small changes in the vertical offset of the fit result in large changes in the x-axis intercept). Accurate estimates of thresholds therefore require the experimenter to model this noise and compensate for its effects, as detailed in Norcia et al. (1989).
A robust estimate of the threshold therefore requires the measurement of the SSVEP amplitude at many different super-threshold contrast levels. This was made faster by the development of the ‘sweep VEP’ paradigm in which the stimulus changed its contrast, spatial frequency or some other property, throughout a trial (Regan, 1973b; Tyler et al., 1979). To avoid hysteresis or short-term adaptation effects, the sweep is sometimes conducted both up and down in the same experiment (Norcia et al., 1990; Norcia and Tyler, 1985a; Tyler et al., 1979). The sweep VEP (really, a ‘sweep SSVEP’) technique is now commonly used to obtain a rapid and objective measurement of visual acuity. In particular, because of its relative speed and simplicity, this technique has now become a standard for conducting tests of visual acuity in very young subjects or where behavioural tests are not appropriate (Bach et al., 2008; Bach and Farmer, 2020; Hoffmann et al., 2017; Ridder, 2004).
The approach of estimating visual thresholds by extrapolating SSVEP responses at higher stimulus levels has revealed much about the development of visual abilities in infants (Atkinson et al., 1979; Braddick et al., 1986; Harris et al., 1976). In general, SSVEP measurements of infant vision have revealed that contrast sensitivity and acuity for both achromatic and chromatic stimuli develop earlier than had been supposed previously based on behavioural readouts (Dobson et al., 1978; Norcia and Tyler, 1985b). For example, using SSVEPs, Norcia and Tyler (1985b) found acuity thresholds in one month old infants that were roughly half those estimated previously from behavioural experiments (reviewed in Brown, 1990). The ability to make very sensitive measurements of early visual function in infants also allows researchers to monitor the trajectory of visual development and identify different temporal epochs of developmental change (Norcia et al., 1990).
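The extrapolation procedure can be sketched as a straight-line fit on a logarithmic contrast axis. The example below assumes NumPy and uses synthetic, noise-free data with a hypothetical threshold of 1% contrast and a hypothetical slope. It also shows how a constant upward offset in the measured amplitudes, used here as a crude stand-in for the effect of EEG noise, biases the extrapolated threshold towards lower contrasts. This additive offset is a simplifying assumption and is not the noise model of Norcia et al. (1989).

```python
import numpy as np

# Synthetic data: all values are illustrative assumptions.
true_threshold = 0.01              # 1% contrast
slope_uv_per_log_unit = 2.0        # microvolts per log10 unit of contrast
contrasts = np.array([0.02, 0.04, 0.08, 0.16, 0.32])
log_contrast = np.log10(contrasts)
amplitudes_uv = slope_uv_per_log_unit * (log_contrast - np.log10(true_threshold))


def extrapolated_threshold(log_c, amplitude):
    """Fit amplitude = slope * log10(contrast) + intercept; return the x-intercept."""
    slope, intercept = np.polyfit(log_c, amplitude, 1)
    return 10 ** (-intercept / slope)


estimated_threshold = extrapolated_threshold(log_contrast, amplitudes_uv)

# A constant offset inflates every amplitude and shifts the intercept leftwards.
noise_offset_uv = 0.3
biased_threshold = extrapolated_threshold(log_contrast, amplitudes_uv + noise_offset_uv)

print(f"noise-free estimate: {estimated_threshold:.4f}")
print(f"with amplitude offset: {biased_threshold:.4f}")
```

Because the x-intercept equals minus the intercept divided by the slope, a given vertical offset produces a larger threshold error when the slope is shallow, which is the sensitivity described in the text.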
SSVEP measures of thresholds for detecting purely chromatic stimuli find responses to isoluminant stimuli by around 5-6 weeks postnatally, with contrast detection reaching near-adult levels by around six months (Morrone et al., 1993). Spatial acuity as measured by SSVEP reaches adult levels more slowly, but near-adult levels are recorded around one year (Hamer et al., 1989; Norcia and Tyler, 1985b) compared to around six to seven years with behavioural measures (Atkinson and Braddick, 1983; Ellemberg et al., 1999).
At least some of this difference is likely due to the relative objectivity and high SNR of the SSVEP technique compared to other methods such as preferential looking, which requires careful measurement of the infant’s gaze direction. In slightly older children, other groups have reported electrophysiological correlates of visual acuity that more closely match the behavioural measures (De Vries-Khoe and Spekreijse, 1982).
The SSVEP technique has also been used to study the development of the contrast gain control mechanisms described in the previous section. Although response gain (Morrone and Burr, 1986) or contrast gain control (Candy et al., 2001; Skoczenski and Norcia, 1998) is measurable in infants as young as six weeks old, its development appears to be slower, with adult levels being reached at approximately 11 years (Pei et al., 2017).
Contrast processing in clinical conditions
The SSVEP technique has also been used to study clinical conditions, such as diseases and developmental disorders. This can often be informative regarding the underlying mechanism that characterises the condition. Here we focus on four conditions, but there is potential to apply the method more broadly: as a diagnostic technique, to monitor disease severity and progression, or to assess the efficacy of treatments.
Epilepsy is a neurological condition in which patients experience seizures - episodes of uncontrolled neural activity that can cause unconsciousness, involuntary movements and convulsions, and atypical sensory experiences. Porciatti et al. (2000) showed that individuals with photosensitive epilepsy generate larger steady-state signals in response to flickering visual stimuli, and their contrast response functions saturate less than those of healthy controls. This is consistent with the idea that epilepsy involves a cortical hyperexcitability that makes seizures more likely. It is also the case for individuals with idiopathic generalised epilepsy (Tsai et al., 2011), a subtype of epilepsy that has a less obvious link to vision. The differences apply across the whole contrast-response function, and so resemble a response gain effect (see Figure 7a), which might be due to reduced inhibition from neighbouring neurons. Differences in SSVEP amplitudes have also been reported in individuals with migraine (Shibata et al., 2008), a condition also associated with cortical hyperexcitability.
Amblyopia is a disorder of binocular vision, characterised by one eye contributing much less to perception than the other. This is often due to strabismus (squint) or anisometropia (difference in optical prescription between the eyes) during development. Contemporary accounts suggest that the amblyopic eye is suppressed by signals from the fellow eye. SSVEPs provide a convenient and objective method to characterise the difference in neural response to signals in each eye, and typically show reduced responses to stimuli in the amblyopic eye (see Figure 7b) across the contrast range (Baker et al., 2015; Lygo et al., 2021). Measurements of interocular contrast modulation (for example, Hou 侯川 et al., 2021; Norcia et al., 2000) can also provide information about the way that amblyopia changes the balance of excitation and inhibition between eyes. There are currently many novel binocular treatments for amblyopia under development, often involving virtual reality or stereo display systems designed to encourage the two eyes to work together. The steady-state approach may be more sensitive and objective than typical acuity measurements, and also has the potential to measure suppression between the eyes directly (e.g. Du et al., 2023; Hu et al., 2023; Zheng et al., 2019).
Autism is a condition often associated with differences in vision (Simmons et al., 2009) and other senses (MacLennan et al., 2022). Pei et al. (2014) used a sweep-VEP method with counterphase flickering stimuli, and found weaker responses in autistic children at spatial frequencies around 8c/deg, compared with age-matched controls. This was subsequently extended to measurements of contrast sensitivity in a further pediatric sample by Vilidaite et al. (2018) (see Figure 7c), who additionally found weaker responses in autistic adults at the second harmonic (using on/off flicker). Interestingly this study replicated its key findings in a Drosophila genetic model of autism (Nhe3 mutations), illustrating the translational potential of the steady-state approach, as well as identifying a possible biomarker for autism.
Recent work on understanding Parkinson’s disease has also used Drosophila genetic models. Afsari et al. (2014) found that mutant flies produced stronger SSVEP responses to flickering lights than control flies (see Figure 7d). The authors theorised that differences in early gain control during development might lead to visual deficits later in life. Although visual responses are a convenient assay of neural function, it is likely that the same general process applies throughout the whole brain, including in the motor system where the core Parkinson’s symptoms (tremor, rigidity, slow movement) manifest. The SSVEP differences were reduced by a kinase inhibitor that targets the dopamine system, demonstrating how model organisms can be used to test new pharmacological treatments. SSVEP responses also provide a potential method to diagnose Parkinson’s before any symptoms manifest, and to monitor the effect of treatments.
Attention and arousal
Attention exerts a profound influence on visual performance: for example, instructing people to attend covertly to a spatial location significantly improves their performance on a target detection task (Bashinski and Bacharach, 1980; Cameron et al., 2002; Carrasco et al., 2000; Morrone et al., 2004; Pestilli et al., 2009; Posner, 1980). In principle, this enhancement may be driven by modulation of the underlying signal or noise characteristics, by additional decision-theoretic factors such as reduction in spatial uncertainty (Gould et al., 2007; Petrov et al., 2006), or by both. Early experiments with non-human primates showed little evidence for attentionally-driven changes in neuronal spike rates (Luck et al., 1997; Marcus and Van Essen, 2002; McAdams and Maunsell, 1999; Mehta et al., 2000a, 2000b), but with the advent of spatially-resolved human brain imaging methods in the late 1990s it became apparent that spatial attention was linked to frank changes in both fMRI (Brefczynski and DeYoe, 1999; Buracas and Boynton, 2007; Gandhi et al., 1999; Kastner et al., 1999; Li et al., 2008; Murray, 2008; Silver et al., 2007; Somers et al., 1999; Tootell et al., 1998) and EEG signals (Ding et al., 2006; Morgan et al., 1996; Müller et al., 1998; Müller and Hillyard, 2000).
Electrophysiological and psychophysical measurements of the effect of attention on both luminance and chromatic contrast have strongly implicated gain control as an underlying mechanism (Di Russo et al., 2001; Di Russo and Spinelli, 1999a; Lu and Dosher, 1998). These effects also appear to differ between chromatic and achromatic pathways (Di Russo and Spinelli, 1999b) - perhaps as a result of the different levels of nonlinear gain control in the early pre-cortical magno-, parvo- and konio-cellular pathways (Derrington and Lennie, 1984; Kaplan and Shapley, 1986; Lee et al., 1990; Solomon and Lennie, 2005). In the late 2000s a comprehensive theoretical model for attentional modulation was developed that framed it as a contrast gain control computation (Boynton, 2009; Reynolds and Heeger, 2009). This framework has proven to be influential - explaining a wide range of phenomena from the earlier literature and demonstrating subtle interactions between the size of the attentional ‘spotlight’ and the stimulus configuration which rationalise many apparent contradictions in the literature - in particular reports of a response- rather than a contrast-gain control phenomenon. Direct measurements of attentional modulation of achromatic SSVEP signals are broadly consistent with this model (Hou et al., 2016; Lauritzen et al., 2010; Martinovic and Andersen, 2018) and confirm the relatively weaker role of spatial attention on responses driven by chromatic stimuli - particularly those that isolate the opponent S-(L+M) cone pathway (Highsmith and Crognale, 2010; Wang and Wade, 2011). The gain control model of attention can also be extended to SSVEP studies of feature-based attention which show that the modulatory effects can be targeted to the most informative neuronal populations (Verghese et al., 2012).
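The distinction between contrast gain and response gain can be made concrete with a saturating contrast response function. The sketch below assumes a Naka-Rushton form with arbitrary illustrative parameters (`r_max`, `c50`, `n`) and a hypothetical attentional factor `attn` of 1.5. It is not the Reynolds and Heeger (2009) model itself, only a cartoon of the two kinds of change that such a model can produce.

```python
def naka_rushton(c, r_max=1.0, c50=0.2, n=2.0):
    """Saturating contrast response function; c is contrast (0 to 1)."""
    return r_max * c**n / (c50**n + c**n)

def contrast_gain(c, attn=1.5):
    # Attention scales the effective input contrast, which shifts the
    # curve leftwards on a log-contrast axis (attn * c may exceed 1).
    return naka_rushton(attn * c)

def response_gain(c, attn=1.5):
    # Attention multiplies the output, scaling the whole curve vertically.
    return attn * naka_rushton(c)

contrasts = [0.02, 0.05, 0.1, 0.2, 0.4, 0.8]
print("contrast  baseline  contrast-gain  response-gain")
for c in contrasts:
    print(f"{c:8.2f}  {naka_rushton(c):8.3f}  "
          f"{contrast_gain(c):13.3f}  {response_gain(c):13.3f}")
```

With these illustrative values, contrast gain produces its largest proportional effect at low contrast and almost none near saturation, whereas response gain scales the response by the same factor at every contrast.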
The SSVEP can also be used to study changes in visual processing caused by different behavioural states or overall arousal. For example, locomotion has been shown to alter neuronal excitability and spatial normalization in mice (Ayaz et al., 2013; Niell and Stryker, 2010) - running mice have higher visual sensitivity and lower surround suppression compared to stationary mice. Although measuring EEG responses from locomoting humans is technically challenging, SSVEP studies (which are able to distinguish broadband noise from input signal effectively) have shown that walking also alters early visual processing, although in a manner different to that observed in mice (Benjamin et al., 2018; Cao and Händel, 2019).
SSVEP measurements are typically used to measure time-invariant responses due to sustained attention. However, recent work has shown that moderately high modulation frequencies (ca. 10 Hz) and short analysis windows (1.5 s) can also be used to track the dynamic allocation of attention across a task (Chota et al., 2024) or attention to moving targets (Lissa et al., 2020). The ability to track changes in attention over short time periods is also important if the SSVEP is to be used for dynamic readouts - for example in a brain-computer interface.
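A sliding-window analysis of this kind can be sketched in a few lines. A 1.5 s window gives a frequency resolution of 1/1.5 ≈ 0.67 Hz, and a 10 Hz signal completes exactly 15 cycles within it, so the tagged frequency falls on an analysis bin. Everything else in the example is an assumption made for illustration: the 500 Hz sampling rate, the 0.5 s step between windows, the noise-free synthetic signal whose amplitude doubles after 3 s, and the helper name `amplitude_at`. It is not the analysis pipeline used in the studies cited above.

```python
import cmath
import math

def amplitude_at(samples, freq_hz, sample_rate_hz):
    """Amplitude of one frequency component (single-bin Fourier sum)."""
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
                for k, x in enumerate(samples))
    return 2 * abs(coeff) / n

sample_rate_hz = 500            # assumed sampling rate
tag_hz = 10.0                   # tagging frequency
window_s, step_s = 1.5, 0.5     # window length and hypothetical step size

# Synthetic 6 s "recording": response amplitude doubles at t = 3 s,
# standing in for a shift of attention towards the tagged stimulus.
signal = []
for k in range(6 * sample_rate_hz):
    t = k / sample_rate_hz
    amp = 1.0 if t < 3.0 else 2.0
    signal.append(amp * math.sin(2 * math.pi * tag_hz * t))

win = int(window_s * sample_rate_hz)    # 750 samples = 15 full cycles
step = int(step_s * sample_rate_hz)
track = [amplitude_at(signal[s:s + win], tag_hz, sample_rate_hz)
         for s in range(0, len(signal) - win + 1, step)]

for i, a in enumerate(track):
    print(f"window starting at {i * step_s:.1f} s: amplitude {a:.2f}")
```

Windows that straddle the change report intermediate amplitudes, which illustrates the trade-off: shorter windows follow changes more closely but have coarser frequency resolution and, in real data, lower SNR.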
Brain-computer interfaces
One widespread recent application of the SSVEP technique is in the design of brain-computer interfaces (BCIs), which seek to control some aspect of a computer using neural signals. The high SNR and precise frequency resolution of SSVEPs are advantages for this approach. Typical studies may involve presenting an array of stimuli at different flicker frequencies, and having the participant select one either by overt attention (i.e. shifting fixation to foveate the selected stimulus) or covert attention (i.e. deploying attention to one stimulus whilst maintaining fixation) (Middendorf et al., 2000). Because SSVEP signals are highly sensitive to both visual field position (Ales et al., 2010; Di Russo et al., 2007) and attentional state (Lauritzen et al., 2010; Morgan et al., 1996; Müller et al., 1998; Verghese et al., 2012), the response to the selected stimulus will typically increase relative to the others, allowing it to be identified by an on-line algorithm. Because more than one stimulus frequency is generally present, this modulation and the associated changes in gain control will affect the entire pattern of self- and intermodulation terms, allowing the choice to be decoded by a multivariate pattern classifier.
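The simplest version of this decoding logic can be sketched as follows: simulate a trial containing several tagged responses, boost the one at the attended frequency, and choose the frequency with the largest amplitude. All specifics here are assumptions for illustration: the four tagging frequencies, the 250 Hz sampling rate, the 2 s trial length, the attentional boost of 1.5, the Gaussian noise level, and the function names. A practical system would normally use a multivariate classifier and per-frequency normalisation rather than this univariate rule.

```python
import cmath
import math
import random

def amplitude_at(samples, freq_hz, sample_rate_hz):
    """Amplitude of one frequency component (single-bin Fourier sum)."""
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
                for k, x in enumerate(samples))
    return 2 * abs(coeff) / n

SAMPLE_RATE_HZ = 250                      # assumed sampling rate
TAG_FREQS_HZ = [8.0, 10.0, 12.0, 15.0]    # hypothetical stimulus frequencies

def simulate_trial(attended_hz, rng, duration_s=2.0, attn_boost=1.5, noise_sd=1.0):
    """Sum of tagged sinusoids plus Gaussian noise; the attended one is larger."""
    trial = []
    for k in range(int(SAMPLE_RATE_HZ * duration_s)):
        t = k / SAMPLE_RATE_HZ
        x = rng.gauss(0.0, noise_sd)
        for f in TAG_FREQS_HZ:
            amp = attn_boost if f == attended_hz else 1.0
            x += amp * math.sin(2 * math.pi * f * t)
        trial.append(x)
    return trial

def decode(trial):
    """Pick the tagging frequency with the largest response amplitude."""
    return max(TAG_FREQS_HZ, key=lambda f: amplitude_at(trial, f, SAMPLE_RATE_HZ))

rng = random.Random(0)                    # fixed seed for repeatability
n_trials = 40
n_correct = 0
for i in range(n_trials):
    target = TAG_FREQS_HZ[i % len(TAG_FREQS_HZ)]
    if decode(simulate_trial(target, rng)) == target:
        n_correct += 1
accuracy = n_correct / n_trials
print(f"decoding accuracy: {accuracy:.2f} (chance = {1 / len(TAG_FREQS_HZ):.2f})")
```

The trial length is chosen so that every tagging frequency completes a whole number of cycles, which keeps the responses from leaking into each other's analysis bins.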
This approach is primarily useful in situations that require the BCI to distinguish among a small set of possibilities: for example, in early work, visual stimuli representing ‘Left’ and ‘Right’ commands in a flight simulator were distinguished robustly (Middendorf et al., 2000). Although SSVEP-based BCIs typically do use contrast flicker, work in this field has largely focused on optimising the stimuli or decoders to increase decoding performance and on reducing the visual fatigue associated with the long-term presentation of arrays of high-contrast flicker (Diez et al., 2024), rather than on studying visual contrast processing. We therefore note in passing that using SSVEPs to improve our understanding of early contrast processing may yield benefits to this related field.
Future directions
The SSVEP is a powerful tool for studying contrast processing. It provides a high SNR readout of neuronal activity that is unambiguously linked to the input. It is sensitive to both the amplitude and phase of the input and, when combined with source imaging, it can be extracted from different cortical regions, allowing researchers to track contrast processing computations across the visual pathway. Because of the high SNR, it can be measured in subjects where long recording durations are impractical (for example, infants or patients with neurological disorders) and at frequencies high enough to be effectively invisible (Herrmann, 2001; Minarik et al., 2023; Seijdel et al., 2023). The SSVEP is also able to ‘fingerprint’ modulators of the inputs through the harmonic and intermodulation terms they generate in the output. In principle, each nonlinearity in the visual pathway can be identified by its contribution to the frequency spectrum at different recording locations (Regan and Regan, 1988). This, in turn, allows researchers to study how contrast processing depends on spatial and temporal context, as well as changes in task, cognitive and behavioural state, and arousal.
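The ‘fingerprinting’ idea can be illustrated with a toy system in which the same nonlinearity is placed either before or after two frequency-tagged inputs are combined. The sketch assumes a hypothetical static nonlinearity (a linear term plus a squaring term), tagging frequencies of 5 and 7 Hz, and a 1 s noise-free signal sampled at 200 Hz. None of these values come from a specific study.

```python
import cmath
import math

def amplitude_at(samples, freq_hz, sample_rate_hz):
    """Amplitude of one frequency component (single-bin Fourier sum)."""
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
                for k, x in enumerate(samples))
    return 2 * abs(coeff) / n

def nonlinearity(x):
    # Hypothetical static nonlinearity: linear term plus a squaring term.
    return x + 0.5 * x * x

sample_rate_hz = 200
f1, f2 = 5.0, 7.0    # assumed tagging frequencies
times = [k / sample_rate_hz for k in range(sample_rate_hz)]   # 1 s of samples
in1 = [math.sin(2 * math.pi * f1 * t) for t in times]
in2 = [math.sin(2 * math.pi * f2 * t) for t in times]

# Nonlinearity applied to each input separately, then summed.
before = [nonlinearity(a) + nonlinearity(b) for a, b in zip(in1, in2)]
# Inputs summed first, then passed through the nonlinearity.
after = [nonlinearity(a + b) for a, b in zip(in1, in2)]

terms = {"f1": f1, "f2": f2, "2f1": 2 * f1, "2f2": 2 * f2,
         "f2-f1": f2 - f1, "f1+f2": f1 + f2}
spec_before = {name: amplitude_at(before, f, sample_rate_hz) for name, f in terms.items()}
spec_after = {name: amplitude_at(after, f, sample_rate_hz) for name, f in terms.items()}

print("term    before-combination  after-combination")
for name in terms:
    print(f"{name:6s}  {spec_before[name]:18.3f}  {spec_after[name]:17.3f}")
```

Both arrangements generate second harmonics, but only the nonlinearity that acts after combination produces energy at the intermodulation frequencies (f2-f1 and f1+f2), which is why these terms are informative about where in the pathway signals interact.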
Although visual neuroscience is a relatively old subfield, there are still outstanding questions relating to contrast processing that could be addressed by SSVEP methods. First, it is still not completely clear how contrast signals are computed in the human retina. Although we have more than a century of electrophysiological data from animals, and the broad structure of cone inputs to retinal ganglion cells is understood (Li et al., 2014), we are still discovering new aspects of retinal processing that could influence the ‘coining’ of the visual system’s currency (Gollisch and Meister, 2010; Uprety et al., 2022; Wang et al., 2023). The ability to frequency tag both inputs (for example, cone-directed luminance contrast) and modulators (for example, stimuli that selectively drive the intrinsically-photosensitive retinal ganglion cells) is a powerful tool to explore this first stage of image generation.
Later in the visual pathway, we would like to know more about the role of corticothalamic feedback in the LGN. Although it is commonly thought of as a simple relay between the eye and the cortex, the majority of inputs to the LGN come from cortex rather than the eye, and there is good evidence that contrast processing in the LGN can be altered by top-down signals including attention (Briggs and Usrey, 2007; Gouws et al., 2014; O’Connor et al., 2002; Sherman and Guillery, 1996). Bottom-up inputs to the LGN are segregated by eye and so responses there are often considered to be purely monocular, but it is possible that feedback signals allow some binocular computations such as interocular normalisation (Baker et al., 2007; Dougherty et al., 2019) to begin even at this relatively early stage. Frequency tagging inputs to different eyes or in different precortical pathways allows us to address neurons in different parts of the LGN. These techniques combined with advances in recording technology (for example, sensitive source-imaged EEG and MEG recordings that can resolve subcortical structures (Attal et al., 2012; Attal and Schwartz, 2013; Tesche, 1996), noninvasive deep-brain stimulation techniques (Mohammadjavadi et al., 2022) or implanted electrode arrays (Krolak-Salmon et al., 2003)) may allow us to study these computations in more detail.
Finally, SSVEPs continue to provide a way of studying contrast in the cortex. Here, we are often interested in how different visual parameters interact. For example, how are signals from different eyes combined to generate both scalar contrast values and also binocular depth cues? How does contrast combination depend on the low level properties of the individual inputs such as their retinotopic location, cone contrast, spatiotemporal frequency, eye of origin and orientation? Colour vision scientists are particularly interested in how chromatic signals originating in a small number of cone-opponent retinal pathways are transformed into a perceptual colour space where the ‘unique hues’ appear to be only weakly-related to the early retinal outputs and how these computations are conditioned by the spatial and temporal properties of the scene (Gegenfurtner, 2003; Kaneko et al., 2020; Li et al., 2022; Solomon and Lennie, 2007; Stoughton and Conway, 2008; Wandell, 1993). All of these questions can be addressed by using SSVEP and frequency tagging to examine the computations that combine and transform the inputs at different cortical stages (Baker and Wade, 2017; Busse et al., 2009; Chen and Gegenfurtner, 2021; Katyal et al., 2018; Watts et al., 2024).
The original promise of the SSVEP approach was that the entire complex-valued frequency spectrum recorded at each location provided detailed information about the processing nonlinearities up to and including that point. In principle, this would allow researchers to uniquely fit the parameters of their computational models of visual processing (Regan and Regan, 1988). To date, the complexity of the neuronal computations at even the earliest stage of visual processing has hampered this effort - researchers typically restrict their analyses to the amplitudes of single, low-order frequency components (for example, the input frequencies or simple sums and differences of those frequencies). As our understanding of the early visual system improves, it is becoming possible to generate more realistic parameterised forward models of signal generation (Baker and Wade, 2017; Chariker et al., 2020; Groen et al., 2022; Schrimpf et al., 2020; Tsai et al., 2012). Feeding frequency-tagged inputs into these types of model allows us to generate synthetic SSVEP responses that can be compared with those measured in human subjects. In principle, we are therefore able to use the SSVEP to derive the parameters of early visual processing. This approach may also allow us to develop more sensitive tests for the changes in early visual processing that accompany a wide range of neurological diseases and disorders.
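The logic of fitting a forward model to SSVEP data can be sketched with a deliberately simple example. Here the hypothetical forward model passes an on/off flickering contrast waveform through a saturating nonlinearity with a single free parameter, `sigma`, and predicts amplitudes at the first three harmonics. The "observed" amplitudes are themselves synthetic (generated with `sigma = 0.3`) and stand in for real measurements. The model form, parameter values, and grid search are all assumptions for illustration and are far simpler than the forward models cited above.

```python
import cmath
import math

def amplitude_at(samples, freq_hz, sample_rate_hz):
    """Amplitude of one frequency component (single-bin Fourier sum)."""
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
                for k, x in enumerate(samples))
    return 2 * abs(coeff) / n

def forward_model(sigma, peak_contrast=0.8, flicker_hz=5.0,
                  sample_rate_hz=200, duration_s=1.0, exponent=2.0):
    """Hypothetical model: on/off flicker through a saturating nonlinearity.

    Returns predicted amplitudes at harmonics 1f, 2f and 3f.
    """
    response = []
    for k in range(int(sample_rate_hz * duration_s)):
        t = k / sample_rate_hz
        # On/off flicker: contrast varies between 0 and peak_contrast.
        c = peak_contrast * 0.5 * (1 + math.sin(2 * math.pi * flicker_hz * t))
        response.append(c**exponent / (sigma**exponent + c**exponent))
    return [amplitude_at(response, h * flicker_hz, sample_rate_hz) for h in (1, 2, 3)]

# Synthetic stand-in for measured harmonic amplitudes (not real data).
observed = forward_model(sigma=0.3)

def sse(sigma):
    """Sum of squared errors between predicted and observed amplitudes."""
    return sum((p - o) ** 2 for p, o in zip(forward_model(sigma), observed))

# Coarse grid search over candidate parameter values.
candidates = [round(0.05 * i, 2) for i in range(1, 21)]
best_sigma = min(candidates, key=sse)
print(f"best-fitting sigma: {best_sigma}")
```

Using several harmonics together constrains the parameter more tightly than the fundamental alone. With real data, noise and the larger number of free parameters in realistic models make this kind of fit much harder.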